Abstract: The Hopfield network has been applied to optimization
problems for decades, yet it still has many limitations in this
role, most of them inherited from the optimization algorithms it
implements. The computation of a Hopfield network, defined by a
set of difference equations, is easily trapped in one local
optimum or another and is sensitive to initial conditions,
perturbations, and neuron update order. It gives no indication
of how long convergence will take or whether the final solution
is a global optimum. In this paper, we present a Hopfield network
with a new set of difference equations that addresses these
problems. The difference equations directly implement a new,
powerful optimization algorithm.

I. Introduction 

In the early 1980s, Hopfield and his colleagues published two
scientific papers on "neuron" computation [1], [2]. Hopfield
showed that highly interconnected networks of nonlinear analog
neurons are extremely effective in solving optimization problems.
Since then, researchers have applied the Hopfield network to a
wide class of combinatorial optimization problems (see the survey
in [3]).

In its discrete-time version, the Hopfield network implements
local search. In its continuous-time version, it implements
gradient descent. Both algorithms suffer from the local minimum
problem, and many practical optimization problems have a large
number of local minima. Furthermore, the Hopfield-Tank
formulation of the energy function of the network causes
infeasible solutions to occur most of the time [4], [3]. It was
also found that the valid solutions were only slightly better
than randomly chosen ones.

To guarantee the feasibility of the solutions, the most important
breakthrough came from the valid subspace approaches of Aiyer et
al. [5] and Gee [6]. However, these approaches require
researchers to design a constraint energy function that makes
solutions feasible, add it to the original energy function, and
recalculate the energy function to obtain new connection weights.
This is not simple, and it is unlikely that biological neural
networks implement such a process. To escape from local minima,
many variations of the Hopfield network have been proposed based
on the principles of simulated annealing [7]. The three major
approaches are the Boltzmann [8], Cauchy [9], and Gaussian [10]
machines. In theory, simulated annealing can approach the global
optimal solution in exponential time. However, this is not
guaranteed, and the method is too slow to be effective in
practice. Like local search, it gives no indication of how long
it will take to converge, nor can it tell whether a solution is a
global optimum so that the search can be stopped.

Those improvements make the Hopfield network competitive with
conventional optimization algorithms, such as simulated
annealing. However, it cannot be more powerful than those
algorithms, because it merely implements them using
interconnected networks of computing units, such as neurons. Its
capability is restricted by the limitations of the network
structure and by the theoretical limitations of the optimization
algorithms it implements. Those conventional optimization
algorithms have both performance problems and convergence
problems and are far from satisfactory in solving practical
problems. For example, stereo matching is an important problem in
computer vision and one of the most active research areas in that
field [11], [12], [13], [14]. Compared with many specialized
optimization algorithms, such as graph cuts [12], [15], simulated
annealing has the worst performance in both solution quality and
computing time. The computer vision community no longer even
includes it in comparisons of optimization algorithms for stereo
matching [11].

In this paper, we present a Hopfield network with a new set of
difference equations to fix those problems. In solving
large-scale optimization problems in computer vision, it
significantly outperforms general optimization algorithms, such
as simulated annealing and local search with multi-restarts, as
well as specialized algorithms.

II. Cooperative Neural Network Computation 

One of the most popular energy functions used in computer vision
and other areas has the following general form:

$$E(x_1, x_2, \ldots, x_n) = \sum_{i} C_i(x_i) + \sum_{i,j,\; i \neq j} C_{ij}(x_i, x_j), \qquad (1)$$

where each variable $x_i$, for $1 \le i \le n$, has a finite
domain $D_i$ of size $m_i$ ($m_i = |D_i|$). $C_i$ is a
real-valued function on variable $x_i$, called the unary
constraint on $x_i$, and $C_{ij}$ is a real-valued function on
variables $x_i$ and $x_j$, called the binary constraint on $x_i$
and $x_j$. The optimization of the above energy function is
NP-hard. It is called the binary constraint optimization problem
in computer science, a special case of the constraint model in
which each constraint involves at most two variables.
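To make the form of (1) concrete, the following minimal Python sketch evaluates the energy of a small instance and finds its minimum by exhaustive search. The instance (`unary`, `binary`, `differ`) and the function names are illustrative assumptions, not taken from the paper; the exhaustive search visits every assignment and is therefore only usable for tiny problems.

```python
from itertools import product

def energy(x, unary, binary):
    """E(x) = sum_i C_i(x_i) + sum over ordered pairs (i, j) of C_ij(x_i, x_j)."""
    total = sum(unary[i][value] for i, value in enumerate(x))
    for (i, j), table in binary.items():
        total += table[x[i]][x[j]]
    return total

def brute_force_minimum(unary, binary):
    """Exhaustive search over all assignments; exponential in the number of variables."""
    domains = [range(len(costs)) for costs in unary]
    return min((energy(x, unary, binary), x) for x in product(*domains))

# Hypothetical toy instance: three variables with two values each.
unary = [[0, 3], [1, 0], [0, 3]]      # unary[i][a] = C_i(a)
differ = [[0, 1], [1, 0]]             # penalises neighbours that take different values
binary = {(0, 1): differ, (1, 0): differ, (1, 2): differ, (2, 1): differ}

best_cost, best_x = brute_force_minimum(unary, binary)
print(best_cost, best_x)  # 1 (0, 0, 0)
```

The number of assignments grows as the product of the domain sizes, which is why a scalable algorithm is needed for realistic instances.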



Without loss of generality, we assume all constraints are
nonnegative functions throughout this paper. Also, we focus on
the minimization of the function (1) because minimization and
maximization are logically equivalent.

A. The New Set of Difference Equations 

Following the Hopfield network formulation, we use a number of
neurons, one for each value of each variable. If variable $x_i$
has $m_i$ values, we have $m_i$ neurons, one for each value. This
set of neurons is called a neuron group. In total, we have
$\sum_i m_i$ neurons organized as $n$ groups for $n$ variables.

The state of the neuron for the value $x_i$ of the $i$th variable
is denoted as $c_i(x_i)$. Because we are dealing with a
minimization problem, for mathematical convenience we require
$c_i(x_i) \ge 0$ and use lower values to indicate higher
activation. If $c_i(x_i) = 0$, the corresponding neuron has the
highest activation level.

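Different from the Hopfield network, we use a new set of
difference equations:

$$c_i^{(k)}(x_i) = \sigma_{t_i^{(k)}}\big(g_i(x_i)\big), \quad \text{for } x_i \in D_i \text{ and } 1 \le i \le n, \qquad (2)$$

where $\sigma_{t_i^{(k)}}$ is a threshold function with the
threshold $t_i^{(k)}$. The threshold function $\sigma_t$ is
defined as

$$\sigma_t(x) = \begin{cases} x, & \text{if } x \le t; \\ \infty, & \text{if } x > t, \end{cases}$$

and $g_i(x_i)$ is defined as

$$g_i(x_i) = (1 - \lambda_k) C_i(x_i) + \lambda_k w_{ii} c_i^{(k-1)}(x_i) + \sum_{j,\; j \neq i} \min_{x_j} \Big( (1 - \lambda_k) C_{ij}(x_i, x_j) + \lambda_k w_{ij} c_j^{(k-1)}(x_j) \Big),$$

where $k$ is the iteration step and $w_{ij}$ are non-negative
weights satisfying $\sum_i w_{ij} = 1$. Parameter $\lambda_k$ is
a weight satisfying $0 \le \lambda_k < 1$.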

To ensure the feasibility of solutions, we follow the
winner-take-all principle by using the threshold function to
progressively inhibit more and more neurons in the same group.
Eventually, we let only one neuron be active in each group. The
values corresponding to the remaining active neurons constitute a
solution to the original combinatorial optimization problem (1).
This approach is different from the one used by the Hopfield
network, which adds a new constraint energy function to ensure
the feasibility of solutions.
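The update of one neuron group under Eq. (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the names `threshold` and `update_group`, the toy instance, the uniform weight matrix `W`, and the choices that inhibited neurons stay inhibited and are skipped inside the minimum are all assumptions made for the sketch. It also assumes each group keeps at least one active neuron.

```python
import math

def threshold(value, t):
    """sigma_t: pass the value through if it is <= t, otherwise inhibit it."""
    return value if value <= t else math.inf

def update_group(i, unary, binary, W, c_prev, lam, t):
    """New states c_i^(k)(x_i) of neuron group i, following Eq. (2)."""
    states = []
    for a in range(len(unary[i])):
        if math.isinf(c_prev[i][a]):
            states.append(math.inf)      # assumption: an inhibited neuron stays inhibited
            continue
        g = (1 - lam) * unary[i][a] + lam * W[i][i] * c_prev[i][a]
        for j in range(len(unary)):
            if j == i:
                continue
            table = binary.get((i, j))   # assumption: a missing pair means C_ij = 0
            g += min(
                (1 - lam) * (table[a][b] if table else 0.0)
                + lam * W[i][j] * c_prev[j][b]
                for b in range(len(unary[j]))
                if not math.isinf(c_prev[j][b])   # minimise over active neurons only
            )
        states.append(threshold(g, t))
    return states

# Hypothetical toy instance: three variables with two values each.
unary = [[0, 3], [1, 0], [0, 3]]
differ = [[0, 1], [1, 0]]
binary = {(0, 1): differ, (1, 0): differ, (1, 2): differ, (2, 1): differ}
W = [[1 / 3] * 3 for _ in range(3)]
c0 = [[0.0, 0.0] for _ in range(3)]

group0 = update_group(0, unary, binary, W, c0, lam=0.5, t=1.0)
group1 = update_group(1, unary, binary, W, c0, lam=0.5, t=1.0)
print(group0, group1)  # [0.0, inf] [0.5, 0.0]
```

With the threshold 1.0, the second neuron of group 0 is inhibited after one step, while both neurons of group 1 remain active.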

This set of difference equations is a direct neural network
implementation of a new optimization algorithm (detailed in the
following subsection) for energy function minimization in a
general form. This algorithm is based on a principle for
optimization, called cooperative computation, that is completely
different from many existing ones. Given an optimization problem
instance, the computation always has a unique equilibrium and
converges to it at an exponential rate regardless of initial
conditions and perturbations.

It is important to note that, unlike the discrete-time Hopfield
network, our set of difference equations does not require
updating one state at a time. All states for all values of all
variables can be updated simultaneously in parallel. This is
another advantage of this neural network over the classical
Hopfield network.

B. The Cooperative Optimization Algorithm 

To solve a hard combinatorial optimization problem, we 
follow the divide-and-conquer principle. We first break up 
the problem into a number of sub-problems of manageable 
sizes and complexities. Following that, we assign each sub- 
problem to an agent, and ask those agents to solve the sub- 
problems in a cooperative way. The cooperation is achieved by 
asking each agent to compromise its solution with the solutions 
of others instead of solving the sub-problems independently. 
We can make an analogy with team playing, where the team 
members work together to achieve the best for the team, but 
not necessarily the best for each member. In many cases, 
cooperation of this kind can dramatically improve the problem- 
solving capabilities of the agents as a team, even when each 
agent may have very limited power. 

To be more specific, let $E(x_1, x_2, \ldots, x_n)$ be a
multivariate objective function, or simply $E(x)$, where each
variable $x_i$ has a finite domain $D_i$ of size $m_i$
($m_i = |D_i|$). We break the function into $n$ sub-objective
functions $E_i$ ($i = 1, 2, \ldots, n$), such that $E_i$ contains
at least variable $x_i$ for each $i$, the minimization of each
sub-objective function $E_i$ (the sub-problem) is computationally
manageable in size and complexity, and

$$E(x) = \sum_{i=1}^{n} E_i(x). \qquad (3)$$

For example, the binary constraint optimization problem (1) has a
straightforward decomposition:

$$E_i = C_i(x_i) + \sum_{j,\; j \neq i} C_{ij}(x_i, x_j), \quad \text{for } i = 1, 2, \ldots, n.$$

The $n$ sub-problems can be described as

$$\min_{x_j \in X_i} E_i, \quad \text{for } i = 1, 2, \ldots, n, \qquad (4)$$

where $X_i$ is the set of variables that sub-objective function
$E_i$ contains.
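As a quick sanity check of the decomposition, the sketch below verifies on a small instance that the sub-objective functions sum to the full energy, as required by (3). The instance and the function names are hypothetical and serve only as an illustration.

```python
from itertools import product

def energy(x, unary, binary):
    """Full objective (1): unary terms plus binary terms over ordered pairs."""
    total = sum(unary[i][value] for i, value in enumerate(x))
    return total + sum(table[x[i]][x[j]] for (i, j), table in binary.items())

def sub_objective(i, x, unary, binary):
    """E_i = C_i(x_i) + sum over j != i of C_ij(x_i, x_j)."""
    return unary[i][x[i]] + sum(
        table[x[i]][x[j]] for (p, j), table in binary.items() if p == i
    )

# Hypothetical toy instance: three variables with two values each.
unary = [[0, 3], [1, 0], [0, 3]]
differ = [[0, 1], [1, 0]]
binary = {(0, 1): differ, (1, 0): differ, (1, 2): differ, (2, 1): differ}

# Check E(x) = sum_i E_i(x) for every assignment x.
decomposition_holds = all(
    sum(sub_objective(i, x, unary, binary) for i in range(len(unary)))
    == energy(x, unary, binary)
    for x in product(range(2), repeat=3)
)
print(decomposition_holds)  # True
```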

Because of the interdependence of the sub-objective functions, as
in the case of the binary constraint-based function (see
Eq. (1)), minimizing those sub-objective functions independently
can hardly yield a consensus in variable assignments. For
example, the assignment for $x_i$ that minimizes $E_i$ can hardly
be the same as the assignment for the same variable that
minimizes $E_j$ if $E_j$ contains $x_i$. We need to solve those
sub-problems in a cooperative way so that we can reach a
consensus in variable assignments.

To do that, we can break the minimization of each sub-objective
function (see (4)) into two steps,

$$\min_{x_i} \; \min_{x_j \in X_i \setminus x_i} E_i, \quad \text{for } i = 1, 2, \ldots, n,$$

where $X_i \setminus x_i$ denotes the set $X_i$ minus $\{x_i\}$.

That is, first we optimize $E_i$ with respect to all variables
that $E_i$ contains except $x_i$. This gives us the intermediate
solution in optimizing $E_i$, denoted as $c_i(x_i)$,

$$c_i(x_i) = \min_{x_j \in X_i \setminus x_i} E_i, \quad \text{for } i = 1, 2, \ldots, n. \qquad (5)$$

Second, we optimize $c_i(x_i)$ with respect to $x_i$,

$$\min_{x_i} c_i(x_i). \qquad (6)$$

The intermediate solution of the optimization, $c_i(x_i)$, is a
unary constraint on $x_i$ introduced by the algorithm, called the
assignment constraint on variable $x_i$. Given a value of $x_i$,
$c_i(x_i)$ is the minimal value of $E_i$. To minimize $E_i$,
values of $x_i$ with smaller assignment constraint values
$c_i(x_i)$ are preferred over those with higher ones.

To introduce cooperation in solving the sub-problems, we add the
unary constraints $c_j(x_j)$, weighted by a real value $\lambda$,
back to the right side of (5) and modify the functions (5) to be
iterative ones:

$$c_i^{(k)}(x_i) = \min_{x_j \in X_i \setminus x_i} \Big( (1 - \lambda_k) E_i + \lambda_k \sum_j w_{ij} c_j^{(k-1)}(x_j) \Big), \qquad (7)$$

where $k$ is the iteration step and $w_{ij}$ are non-negative
weight values satisfying $\sum_i w_{ij} = 1$. It has been found
[16] that such a choice of $w_{ij}$ ensures that the iterative
update functions converge. The function to be minimized on the
right side of the equation is called the modified sub-objective
function, denoted as $\tilde{E}_i$.

By adding $c_j(x_j)$ back to $E_i$, we ask the optimization of
$E_i$ to compromise its solution with the solutions of the other
sub-problems. As a consequence, cooperation in the optimization
of all the sub-objective functions (the $E_i$'s) is achieved. The
optimization process defined in (7) is called the cooperative
optimization of the sub-problems.
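The iteration (7) can be sketched for the binary decomposition as follows. This is a minimal sketch under assumptions, not the authors' code: the names `cooperative_step` and `candidate`, the toy instance, the uniform weights, the constant cooperation strength 0.5, and the fixed count of 40 iterations are all illustrative choices. No thresholding is applied here, so the states are simply iterated towards the equilibrium.

```python
def cooperative_step(unary, binary, W, c_prev, lam):
    """One synchronous application of update (7) to every value of every variable."""
    n = len(unary)
    c_new = []
    for i in range(n):
        group = []
        for a in range(len(unary[i])):
            g = (1 - lam) * unary[i][a] + lam * W[i][i] * c_prev[i][a]
            for j in range(n):
                if j == i:
                    continue
                table = binary.get((i, j))   # assumption: a missing pair means C_ij = 0
                g += min(
                    (1 - lam) * (table[a][b] if table else 0.0)
                    + lam * W[i][j] * c_prev[j][b]
                    for b in range(len(unary[j]))
                )
            group.append(g)
        c_new.append(group)
    return c_new

def candidate(c):
    """Pick, for each variable, the value with the smallest assignment constraint."""
    return [min(range(len(group)), key=group.__getitem__) for group in c]

# Hypothetical toy instance: three variables with two values each.
unary = [[0, 3], [1, 0], [0, 3]]
differ = [[0, 1], [1, 0]]
binary = {(0, 1): differ, (1, 0): differ, (1, 2): differ, (2, 1): differ}
W = [[1 / 3] * 3 for _ in range(3)]

c = [[0.0, 0.0] for _ in unary]
for _ in range(40):
    c = cooperative_step(unary, binary, W, c, lam=0.5)

solution = candidate(c)
lower_bound = sum(min(group) for group in c)
print(solution)  # [0, 0, 0]
```

On this toy instance the candidate solution coincides with the minimum found by exhaustive enumeration (cost 1), and the sum of the group minima approaches that cost from below. The paper does not guarantee this outcome in general.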

Parameter $\lambda_k$ in (7) controls the level of cooperation at
step $k$ and is called the cooperation strength, satisfying
$0 \le \lambda_k < 1$. A higher value for $\lambda_k$ in (7)
weighs the solutions of the other sub-problems $c_j(x_j)$ more
than that of the current sub-problem $E_i$. In other words, the
solution of each sub-problem compromises more with the solutions
of the other sub-problems. As a consequence, a higher level of
cooperation in the optimization is reached.

The update functions (7) are a set of difference equations of the
assignment constraints $c_i(x_i)$. Unlike conventional difference
equations used by probabilistic relaxation algorithms [17],
cooperative computations [14], and Hopfield networks [1], this
set of difference equations always has one and only one
equilibrium given $\lambda$ and $w_{ij}$. The computation
converges to the equilibrium with an exponential rate, $\lambda$,
regardless of the initial conditions $c_i^{(0)}(x_i)$. Those
computational properties are stated as theorems in the next
section, and their proofs are provided in [18].

By minimizing the linear combination of $E_i$ and $c_j(x_j)$,
which are the intermediate solutions for other sub-problems, we
can reasonably expect that a consensus in variable assignments
can be reached. When the cooperation is strong enough, i.e.,
$\lambda_k \to 1$, the difference equations are dominated by the
assignment constraints $c_j(x_j)$, and the only choice for $x_j$
appears to be the one that minimizes $c_j(x_j)$ for any $E_i$
that contains $x_j$. That is a consensus in variable assignments.

Theory only guarantees the convergence of the computation to the
unique equilibrium of the difference equations. If it converges
to a consensus equilibrium, the solution, which consists of the
consensus assignments for the variables, must be the global
optimum of the objective function $E(x)$, guaranteed by theory
(details in the next section). However, theory does not guarantee
that the equilibrium is a consensus, even when the cooperation
strength $\lambda$ is increased; otherwise, NP = P would follow.

In addition to the cooperation scheme for reaching a consensus in
variable assignments, we introduce another important operation of
the algorithm, called variable value discarding, performed at
each iteration. A value for a variable, say $x_i$, can be
discarded if its assignment constraint value $c_i(x_i)$ is higher
than a certain threshold, $c_i(x_i) > t_i$, because such values
are less preferable in minimizing $E_i$, as explained before.
Theory does provide thresholds for doing that (details in the
next section). The values discarded with those thresholds are
values that cannot be in any global optimal solution. By
discarding values, we can trim the search space. If only one
value is left for each variable after a certain number of
iterations using the thresholds provided by theory, those values
constitute the global optimal solution, guaranteed by theory
[16]. However, theory does not guarantee that one value is left
for each variable in all cases; otherwise, NP = P would follow.
This value discarding operation can be interpreted as neuron
inhibition following the winner-take-all principle if we
implement the algorithm using neural networks.
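The value discarding operation can be sketched as a simple filter on the assignment constraints. The sketch below is illustrative only: the function names, the example states, and the threshold values are hypothetical, and the thresholds are hand-picked numbers rather than the theoretical thresholds discussed in the next section.

```python
import math

def discard_values(c, thresholds):
    """Inhibit every value whose assignment constraint exceeds its group's threshold."""
    return [
        [value if value <= t else math.inf for value in group]
        for group, t in zip(c, thresholds)
    ]

def surviving_values(c):
    """List, per variable, the values that are still active (finite state)."""
    return [
        [a for a, value in enumerate(group) if not math.isinf(value)]
        for group in c
    ]

def is_single_solution(c):
    """True when exactly one value is left for every variable."""
    return all(len(values) == 1 for values in surviving_values(c))

# Hypothetical assignment constraints for three variables with two values each.
c = [[0.2, 2.0], [0.7, 0.8], [0.2, 2.0]]

loose = discard_values(c, thresholds=[1.0, 1.0, 1.0])
tight = discard_values(c, thresholds=[1.0, 0.75, 1.0])
print(surviving_values(loose))  # [[0], [0, 1], [0]]
print(surviving_values(tight))  # [[0], [0], [0]]
```

Tightening the threshold of the second group removes its remaining alternative, which mirrors the progressive inhibition described above.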

By discarding values, we increase the chance of reaching 
a consensus equilibrium for the computation. In practice, we 
progressively tighten the thresholds to discard more and more 
values as the iteration proceeds to increase the chance of 
reaching a consensus equilibrium. In the end, we leave only 
one value for each variable. Then, the final solution is a 
consensus equilibrium. 

However, a final solution obtained this way is not guaranteed to
be the global optimum. Nevertheless, in our experiments on
large-scale combinatorial optimization problems, we found that
the solution quality of this algorithm is still satisfactory and
significantly better than that of conventional optimization
methods, such as simulated annealing and local search [16].

C. Definitions and Notations 

In the previous subsection, we chose $w_{ij}$ such that it is
non-zero if $x_j$ is contained in $E_i$. For a binary constraint
optimization problem using the decomposition of the previous
subsection, this implies that $w_{ij}$ is non-zero if and only if
$x_j$ is a neighbor of $x_i$. However, theory tells us that this
is too restrictive. To make the algorithm work, we only need to
choose $(w_{ij})_{n \times n}$ to be a propagation matrix,
defined as follows:

Definition 2.1: A propagation matrix $W = (w_{ij})_{n \times n}$
is an irreducible, nonnegative, real-valued square matrix that
satisfies

$$\sum_{i=1}^{n} w_{ij} = 1, \quad \text{for } 1 \le j \le n.$$

A matrix $W$ is called reducible if there exists a permutation
matrix $P$ such that $P W P^T$ has the block form

$$\begin{pmatrix} A & B \\ O & C \end{pmatrix}.$$
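Definition 2.1 can be checked mechanically. The sketch below tests nonnegativity and unit column sums directly, and tests irreducibility through the standard equivalence with strong connectivity of the directed graph that has an edge from $i$ to $j$ whenever $w_{ij} > 0$. The function name and the tolerance `tol` are assumptions made for this illustration.

```python
def is_propagation_matrix(W, tol=1e-9):
    """Check Definition 2.1: square, nonnegative, unit column sums, irreducible."""
    n = len(W)
    if any(len(row) != n for row in W):
        return False
    if any(w < 0 for row in W for w in row):
        return False
    if any(abs(sum(W[i][j] for i in range(n)) - 1.0) > tol for j in range(n)):
        return False

    def reaches_all(has_edge):
        # Depth-first search from node 0 over the graph described by has_edge.
        seen, stack = {0}, [0]
        while stack:
            i = stack.pop()
            for j in range(n):
                if has_edge(i, j) and j not in seen:
                    seen.add(j)
                    stack.append(j)
        return len(seen) == n

    # Irreducible <=> strongly connected: node 0 reaches every node in the
    # graph and in the graph with all edges reversed.
    forward = reaches_all(lambda i, j: W[i][j] > 0)
    backward = reaches_all(lambda i, j: W[j][i] > 0)
    return forward and backward

uniform = [[1 / 3] * 3 for _ in range(3)]
identity = [[1.0, 0.0], [0.0, 1.0]]          # reducible: the two variables never interact
print(is_propagation_matrix(uniform))   # True
print(is_propagation_matrix(identity))  # False
```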

Definition 2.2: The system is said to reach a consensus solution
if, for any $i$ and $j$ where $E_j$ contains $x_i$,

$$\arg\min_{x_i} \tilde{E}_i = \arg\min_{x_i} \tilde{E}_j,$$

where $\tilde{E}_i$ is defined as the function to be minimized on
the right side of Eq. (7).

Definition 2.3: An equilibrium of the system is a solution to
$c_i(x_i)$, $i = 1, 2, \ldots, n$, that satisfies the difference
equations (7).

To simplify the notation in the following discussions, let

$$c^{(k)} = (c_1^{(k)}, c_2^{(k)}, \ldots, c_n^{(k)}).$$

Let $\tilde{x}_i^{(k)} = \arg\min_{x_i} c_i^{(k)}(x_i)$, the
favorable value for assigning variable $x_i$. Let
$\tilde{x}^{(k)} = (\tilde{x}_1^{(k)}, \tilde{x}_2^{(k)}, \ldots, \tilde{x}_n^{(k)})$,
a candidate solution at iteration $k$.

III. Theoretical Foundations 

A. General Properties 

The following theorem shows that $c_i^{(k)}(x_i)$ for
$x_i \in D_i$ have a direct relationship to the lower bound on
the objective function $E(x)$.

Theorem 3.1: Given any propagation matrix $W$ and the general
initial condition $c^{(0)} = 0$ or $\lambda_1 = 0$,
$\sum_i c_i^{(k)}(x_i)$ is a lower bound function on
$E(x_1, \ldots, x_n)$, denoted as $E_-^{(k)}(x_1, \ldots, x_n)$.
That is,

$$\sum_i c_i^{(k)}(x_i) \le E(x_1, x_2, \ldots, x_n), \quad \text{for any } k \ge 1. \qquad (8)$$

In particular, let
$E_-^{*(k)} = \sum_i c_i^{(k)}(\tilde{x}_i^{(k)})$, where
$\tilde{x}_i^{(k)}$ minimizes $c_i^{(k)}(x_i)$; then $E_-^{*(k)}$
is a lower bound on the optimal cost $E^*$, that is,
$E_-^{*(k)} \le E^*$.

Here, the subscript "-" in $E_-^{*(k)}$ indicates that it is a
lower bound on $E^*$.

This theorem tells us that $\sum_i c_i^{(k)}(x_i)$ provides a
lower bound on the objective function $E$. We will show in the
next theorem that this lower bound is guaranteed to improve as
the iteration proceeds.

Theorem 3.2: Given any propagation matrix $W$, a constant
cooperation strength $\lambda$, and the general initial condition
$c^{(0)} = 0$, $\{E_-^{*(k)} \mid k \ge 1\}$ is a non-decreasing
sequence with upper bound $E^*$.
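The two lower-bound statements can be checked numerically on a small instance. The sketch below iterates the update without thresholding from the zero initial condition and compares the sum of assignment constraints with the energy of every assignment. The toy instance, the function names, the uniform weights, and the constant cooperation strength 0.5 are assumptions chosen for illustration; a small tolerance absorbs floating-point rounding.

```python
from itertools import product

def energy(x, unary, binary):
    total = sum(unary[i][value] for i, value in enumerate(x))
    return total + sum(table[x[i]][x[j]] for (i, j), table in binary.items())

def cooperative_step(unary, binary, W, c_prev, lam):
    """One synchronous update of all assignment constraints, without thresholding."""
    n = len(unary)
    c_new = []
    for i in range(n):
        group = []
        for a in range(len(unary[i])):
            g = (1 - lam) * unary[i][a] + lam * W[i][i] * c_prev[i][a]
            for j in range(n):
                if j == i:
                    continue
                table = binary.get((i, j))   # assumption: a missing pair means C_ij = 0
                g += min(
                    (1 - lam) * (table[a][b] if table else 0.0)
                    + lam * W[i][j] * c_prev[j][b]
                    for b in range(len(unary[j]))
                )
            group.append(g)
        c_new.append(group)
    return c_new

# Hypothetical toy instance whose optimal cost is 1 (found by enumeration).
unary = [[0, 3], [1, 0], [0, 3]]
differ = [[0, 1], [1, 0]]
binary = {(0, 1): differ, (1, 0): differ, (1, 2): differ, (2, 1): differ}
W = [[1 / 3] * 3 for _ in range(3)]

c = [[0.0, 0.0] for _ in unary]
bounds, bound_holds = [], True
for k in range(1, 11):
    c = cooperative_step(unary, binary, W, c, lam=0.5)
    for x in product(range(2), repeat=3):
        lower = sum(c[i][x[i]] for i in range(3))
        bound_holds = bound_holds and lower <= energy(x, unary, binary) + 1e-12
    bounds.append(sum(min(group) for group in c))   # lower bound on the optimal cost

non_decreasing = all(b1 <= b2 + 1e-12 for b1, b2 in zip(bounds, bounds[1:]))
print(bound_holds, non_decreasing)  # True True
```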

If a consensus solution is found at some step or steps, then we
can bound the gap in cost between the consensus solution and the
global optimum. If the algorithm converges to a consensus
solution, then it must also be the global optimum. The following
theorem makes these points clearer.

Theorem 3.3: Given any propagation matrix $W$ and the general
initial condition $c^{(0)} = 0$ or $\lambda_1 = 0$, if a
consensus solution $\tilde{x}$ is found at iteration step $k_1$
and remains the same from step $k_1$ to step $k_2$, then the
closeness between the cost of $\tilde{x}$, $E(\tilde{x})$, and
the optimal cost, $E^*$, satisfies the following two
inequalities,

$$0 \le E(\tilde{x}) - E^* \le \Big( \prod_{k=k_1}^{k_2} \lambda_k \Big) \big( E(\tilde{x}) - E_-^{*(k_1 - 1)} \big), \qquad (9)$$

$$0 \le E(\tilde{x}) - E^* \le \frac{\prod_{k=k_1}^{k_2} \lambda_k}{1 - \prod_{k=k_1}^{k_2} \lambda_k} \big( E^* - E_-^{*(k_1 - 1)} \big), \qquad (10)$$

where $(E^* - E_-^{*(k_1 - 1)})$ is the difference between the
optimal cost $E^*$ and the lower bound on the optimal cost,
$E_-^{*(k_1 - 1)}$, obtained at step $k_1 - 1$. When
$k_2 - k_1 \to \infty$ and $1 - \lambda_k \ge \epsilon > 0$ for
$k_1 \le k \le k_2$, $E(\tilde{x}) \to E^*$.
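To give a feel for how quickly the bounds (9) and (10) shrink, the following sketch evaluates their right-hand sides for given cooperation strengths. The function names and all numbers in the example are hypothetical; in particular, (10) involves the optimal cost, which is normally unknown, so the second function is mainly of theoretical interest.

```python
from math import prod

def gap_bound_from_candidate(lams, cost_x, lower_bound):
    """Right-hand side of (9): product of the cooperation strengths over the
    steps where the consensus persists, times (E(x) - lower bound)."""
    return prod(lams) * (cost_x - lower_bound)

def gap_bound_from_optimum(lams, optimal_cost, lower_bound):
    """Right-hand side of (10); requires the product of the strengths to be < 1."""
    p = prod(lams)
    return p / (1 - p) * (optimal_cost - lower_bound)

# Hypothetical numbers: a consensus that persists for 10 steps with
# cooperation strength 0.5, cost 100, and a lower bound of 80 from the
# step before the consensus appeared.
lams = [0.5] * 10
bound_9 = gap_bound_from_candidate(lams, cost_x=100.0, lower_bound=80.0)
print(bound_9)  # 0.01953125
```

In this example the consensus cost is certified to lie within about 0.02 of the optimum after ten steps, because the product of the strengths is 2^-10.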

B. Convergence Properties 

The performance of the cooperative algorithm further depends on
the dynamic behavior of the difference equations (7). Its
convergence property is revealed in the following two theorems.
The first one shows that, given any propagation matrix and a
constant cooperation strength, there does exist a solution that
satisfies the difference equations (7). The second one shows that
the cooperative algorithm converges exponentially to that
solution.

Theorem 3.4: Given any symmetric propagation matrix $W$ and a
constant cooperation strength $\lambda$, the difference
equations (7) have one and only one solution, denoted as
$(c_i^{(\infty)}(x_i))$ or simply $c^{(\infty)}$.

Theorem 3.5: Given any symmetric propagation matrix $W$ and a
constant cooperation strength $\lambda$, the cooperative
algorithm, with any choice of the initial condition $c^{(0)}$,
converges to $c^{(\infty)}$ with an exponential convergence rate
$\lambda$. That is,

$$\| c^{(k)} - c^{(\infty)} \| \le \lambda^k \| c^{(0)} - c^{(\infty)} \|. \qquad (11)$$

This theorem is called the convergence theorem. It indicates that
our cooperative algorithm is stable and has a unique attractor,
$c^{(\infty)}$. Hence, the evolution of our cooperative algorithm
is robust and insensitive to perturbations, and the final
solution of the algorithm is independent of initial conditions.
In contrast, conventional algorithms based on iterative
improvement have many local attractors due to the local minima
problem. The evolutions of these algorithms are sensitive to
perturbations, and their final solutions depend on initial
conditions.
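Independence of the initial conditions can be illustrated by running the update from two different starting points and watching the two trajectories merge. The sketch below makes several assumptions that are not in the paper: the toy instance, the uniform symmetric weights, the second starting point `d`, and the use of the maximum norm to measure distance. It checks that the distance between the two trajectories shrinks at least as fast as the $k$-th power of the cooperation strength.

```python
def cooperative_step(unary, binary, W, c_prev, lam):
    """One synchronous update of all assignment constraints, without thresholding."""
    n = len(unary)
    c_new = []
    for i in range(n):
        group = []
        for a in range(len(unary[i])):
            g = (1 - lam) * unary[i][a] + lam * W[i][i] * c_prev[i][a]
            for j in range(n):
                if j == i:
                    continue
                table = binary.get((i, j))   # assumption: a missing pair means C_ij = 0
                g += min(
                    (1 - lam) * (table[a][b] if table else 0.0)
                    + lam * W[i][j] * c_prev[j][b]
                    for b in range(len(unary[j]))
                )
            group.append(g)
        c_new.append(group)
    return c_new

def sup_distance(c, d):
    """Maximum absolute difference between two state tables (an assumed norm)."""
    return max(abs(u - v) for gc, gd in zip(c, d) for u, v in zip(gc, gd))

# Hypothetical toy instance with a symmetric, uniform propagation matrix.
unary = [[0, 3], [1, 0], [0, 3]]
differ = [[0, 1], [1, 0]]
binary = {(0, 1): differ, (1, 0): differ, (1, 2): differ, (2, 1): differ}
W = [[1 / 3] * 3 for _ in range(3)]
lam = 0.5

c = [[0.0, 0.0] for _ in unary]            # zero initial condition
d = [[5.0, 1.0], [0.0, 7.0], [3.0, 3.0]]   # an arbitrary different start
initial_gap = sup_distance(c, d)

contraction_holds = True
for k in range(1, 61):
    c = cooperative_step(unary, binary, W, c, lam)
    d = cooperative_step(unary, binary, W, d, lam)
    gap = sup_distance(c, d)
    contraction_holds = contraction_holds and gap <= lam ** k * initial_gap + 1e-12

final_gap = sup_distance(c, d)
print(contraction_holds)  # True
```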

C. Necessary Conditions 

The two necessary conditions provided in this subsection allow us
to discard variable values that cannot be in any global optimum.

Theorem 3.6: Given a propagation matrix $W$ and the general
initial condition $c^{(0)} = 0$ or $\lambda_1 = 0$, if value
$x_i^*$ ($x_i^* \in D_i$) is in the global optimum, then
$c_i^{(k)}(x_i^*)$, for any $k \ge 1$, must satisfy the following
inequality,

$$c_i^{(k)}(x_i^*) \le \big( E^* - E_-^{*(k)} \big) + c_i^{(k)}(\tilde{x}_i^{(k)}), \qquad (12)$$

where $E_-^{*(k)}$ is, as defined before, a lower bound on $E^*$
obtained by the cooperative system at step $k$.

Theorem 3.7: Given a symmetric propagation matrix $W$ and the
general initial condition $c^{(0)} = 0$ or $\lambda_1 = 0$, if
value $x_i^*$ ($x_i^* \in D_i$) is in the global optimum, then
$c_i^{(k)}(x_i^*)$ must satisfy the following inequality,

$$c_i^{(k)}(x_i^*) \le \frac{E^*}{n} + \frac{n-1}{n} \alpha_2^{(k)} E^*. \qquad (13)$$

Here $\alpha_2^{(k)}$ is computed by the following recursive
function:

$$\alpha_2^{(1)} = \lambda_1 \alpha_2 + (1 - \lambda_1),$$
$$\alpha_2^{(k)} = \lambda_k \alpha_2 \alpha_2^{(k-1)} + (1 - \lambda_k),$$

where $\alpha_2$ is the second largest eigenvalue of the
propagation matrix $W$.

For the particular choice of $W = \frac{1}{n}(1)_{n \times n}$,

$$\alpha_2^{(k)} = (1 - \lambda_k)$$

and

$$c_i^{(k)}(x_i^*) \le \frac{E^*}{n} + \frac{n-1}{n} (1 - \lambda_k) E^*. \qquad (14)$$

Inequality (12) and Inequality (13) provide two criteria for
checking whether a value can be in some global optimum. If either
of them is not satisfied, the value can be discarded from the
value set to reduce the search space.

Both thresholds in (12) and (13) become tighter and tighter as
the iteration proceeds. Therefore, more and more values can be
discarded and the search space can be reduced. With the choice of
the general initial condition $c^{(0)} = 0$, the right-hand side
of (12) decreases as the iteration proceeds because of the
property of $E_-^{*(k)}$ revealed by Theorem 3.2. With a constant
cooperation strength $\lambda$, and supposing
$W \ne \frac{1}{n}(1)_{n \times n}$, then $\alpha_2 > 0$ and
$\{\alpha_2^{(k)} \mid k \ge 1\}$ is a monotonic decreasing
sequence satisfying

$$\frac{1 - \lambda}{1 - \lambda \alpha_2} \le \alpha_2^{(k)} \le (1 - \lambda) + \lambda \alpha_2. \qquad (15)$$

This implies that the right-hand side of (13) monotonically
decreases as the iteration proceeds.
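The recursion for the coefficient in (13) is easy to tabulate. The sketch below computes the sequence for a constant cooperation strength and checks it against the bounds in (15). The function name and the sample values of the strength and of the second largest eigenvalue are assumptions chosen for illustration.

```python
def alpha_sequence(lam, alpha2, steps):
    """Coefficients alpha_2^(k), k = 1..steps, for a constant cooperation strength."""
    alphas = [lam * alpha2 + (1 - lam)]
    for _ in range(steps - 1):
        alphas.append(lam * alpha2 * alphas[-1] + (1 - lam))
    return alphas

lam, alpha2 = 0.5, 0.5            # hypothetical strength and second largest eigenvalue
alphas = alpha_sequence(lam, alpha2, steps=10)

lower = (1 - lam) / (1 - lam * alpha2)   # left-hand side of (15)
upper = (1 - lam) + lam * alpha2         # right-hand side of (15)
decreasing = all(a > b for a, b in zip(alphas, alphas[1:]))
within_bounds = all(lower <= a <= upper for a in alphas)
print(alphas[:3])  # [0.75, 0.6875, 0.671875]
```

For the uniform matrix the second largest eigenvalue is zero, and the same function returns the constant value $1 - \lambda$, in agreement with the special case stated before (14).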

IV. Case Studies in Computer Vision 

The proposed algorithm has outperformed many well-known
optimization algorithms in solving real optimization problems in
computer vision [16], [19], image processing [20], and data
communications. These experimental results give strong evidence
of the algorithm's considerable potential.

In this section we compare the performance of the new Hopfield
network with cooperative optimization against the Boltzmann
machine network for stereo matching [11], [12], [15], [21]. The
Boltzmann machine is simply a discrete-time Hopfield network in
which the dynamic function of each neuron is defined by simulated
annealing [7]. Simulated annealing is a well-known optimization
method based on stochastic local optimization.

Fig. 1. A pair of images for stereo matching.

Fig. 2. The performance of neural networks with different
dynamics for four real instances of stereo matching. The ground
truth (left). The results of the Hopfield network with
cooperative optimization (middle). The results of the Boltzmann
machine network (right).

Stereo vision is an important process in human visual perception.
As of now, a satisfactory computational neural model for it is
still lacking. To understand such an important process, people
treat stereo vision as stereo matching. Stereo matching uses a
pair of 2-D images of the same scene, taken at the same time from
two different locations, to recover the depth information of the
scene (see Fig. 1).

Instead of using toy problems, we tested both types of neural
networks on real problems. Four pairs of images, including the
one shown in Fig. 1, are used in our experiments. The ground
truth and the depth images obtained by the Boltzmann machine and
by the new Hopfield network with cooperative optimization are
shown in Fig. 2. Clearly, the results of the new Hopfield network
with cooperative optimization are much cleaner, much smoother,
and much better than the results of the Boltzmann machine.



TABLE I
The minimal energies (×10^3) found by neural networks with
different dynamics.

Image       Boltzmann Machine    New Hopfield Network
Map         580                  329
Sawtooth    182                  143
Tsukuba     781                  518
Venus       197                  125

Table I lists the minimal energies found by the two types of
neural networks. Those found by the new Hopfield network are
roughly 21% to 43% lower than those found by the Boltzmann
machine.
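The relative energy reductions implied by Table I can be recomputed with a few lines of Python. The dictionary below simply transcribes the table (in units of 10^3); the variable names are illustrative.

```python
# Minimal energies from Table I, in units of 10^3:
# image name -> (Boltzmann machine, new Hopfield network)
energies = {
    "Map": (580, 329),
    "Sawtooth": (182, 143),
    "Tsukuba": (781, 518),
    "Venus": (197, 125),
}

# Relative reduction of the minimal energy with respect to the Boltzmann machine.
reduction = {
    name: (boltzmann - hopfield) / boltzmann
    for name, (boltzmann, hopfield) in energies.items()
}

for name, value in reduction.items():
    print(f"{name}: {value:.1%}")
# Map: 43.3%
# Sawtooth: 21.4%
# Tsukuba: 33.7%
# Venus: 36.5%
```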

The number of iterations is 16 for the new Hopfield network and
100 for the Boltzmann network. In each iteration, all neurons are
updated once. On average, the new Hopfield network is three times
faster than the Boltzmann machine in our simulation. Another big
advantage of the new Hopfield network over the Boltzmann machine
is its inherent parallelism. In each iteration, all neurons in
the new Hopfield network can be updated fully in parallel. This
feature, together with the excellent performance of the new
Hopfield network, offers commercial potential for implementing
stereo vision capability in robots and unmanned vehicles.

V. Conclusions 

This paper presented a neural network implementation of a new,
powerful cooperative algorithm for solving combinatorial
optimization problems. It fixes many problems of the Hopfield
network from theoretical, performance, and implementation
perspectives. Its operations are based on parallel, local
iterative interactions. The proposed algorithm has many important
computational properties absent in existing optimization methods.
Given an optimization problem instance, the computation always
has a unique equilibrium and converges to it at an exponential
rate regardless of initial conditions and perturbations. There
are sufficient conditions [16] for identifying the global optimum
and necessary conditions for trimming search spaces. In solving
large-scale optimization problems in computer vision, it
significantly outperformed classical optimization methods, such
as simulated annealing and local search with multi-restarts.

One of the key processes of cooperative computation is value
discarding. This is the same in principle as the inhibition
process used by Marr and Poggio in [14], and Lawrence and Kanade
in [13]. The inhibition process makes cooperative computation
fundamentally different from most known optimization methods. As
Steven Pinker pointed out in his book "How the Mind Works",
cooperative optimization captures the flavor of the brain's
computation of stereo vision. It has many important computational
properties not possessed by conventional methods. These
properties could help us understand the cooperative computation
possibly used by human brains in solving early vision problems.